NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey
B. Ross Katz, Abdul Khan, James York-Winegar, Alexander J. Titus
Summary: NHANES, the National Health and Nutrition Examination Survey, is a program of studies led by the Centers for Disease Control and Prevention (CDC) designed to assess the health and nutritional status of adults and children in the United States (U.S.). NHANES data is frequently used by biostatisticians and clinical scientists to study health trends across the U.S., but every analysis requires extensive data management and cleaning before use, and this repetitive data engineering collectively costs valuable research time and decreases the reproducibility of analyses. Here, we introduce NHANES-GCP, a set of Cloud Development Kit for Terraform (CDKTF) Infrastructure-as-Code (IaC) and Data Build Tool (dbt) resources built on the Google Cloud Platform (GCP) that automates the data engineering and management aspects of working with NHANES data. With current GCP pricing, NHANES-GCP costs less than $2 to run and incurs less than $15/yr in ongoing costs for hosting the NHANES data, all while providing researchers with clean data tables that can readily be integrated for large-scale analyses. We provide examples of leveraging BigQuery ML to select data, integrate data, train machine learning and statistical models, and generate results, all from a single SQL-like query. NHANES-GCP is designed to enhance the reproducibility of analyses and create a well-engineered NHANES data resource for statistics, machine learning, and fine-tuning Large Language Models (LLMs). Availability and implementation: NHANES-GCP is available at https://github.com/In-Vivo-Group/NHANES-GCP
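As a rough illustration of the single-query workflow the abstract describes, a BigQuery ML model can be trained and evaluated directly over a hosted table. The project, dataset, table, and column names below are hypothetical placeholders, not the actual schema produced by the NHANES-GCP repository:

```sql
-- Sketch only: `my_project.nhanes.demographics_labs` and all column
-- names are illustrative, not the repository's real schema.
CREATE OR REPLACE MODEL `my_project.nhanes.diabetes_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['diabetes']
) AS
SELECT
  age, bmi, systolic_bp, ldl_cholesterol, diabetes
FROM
  `my_project.nhanes.demographics_labs`
WHERE
  diabetes IS NOT NULL;

-- A second statement returns evaluation metrics for the trained model.
SELECT * FROM ML.EVALUATE(MODEL `my_project.nhanes.diabetes_model`);
```

Because BigQuery ML handles feature ingestion, training, and evaluation server-side, the selection, integration, and modeling steps collapse into declarative SQL rather than a bespoke pipeline.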
Moving towards reproducible machine learning - Nature Computational Science
An important step when constructing a model is the collection and selection of the datasets, as the quality of the model greatly depends on the quality and characteristics of the data. The data collection process needs to be properly discussed and reported, as there can be biases (intentional and/or unintentional) with regard to the selected data sources. Any identified biases, and attempts to mitigate them, should also be properly discussed, so that other researchers can be aware of the limitations when using the reported models. If synthetic data is used, the data generation process, including any assumptions made, needs to be described in detail. Raw datasets are in fact rarely used directly, since they may contain inconsistencies, errors, and outliers that can ultimately impact the quality of the model. In addition, data might need to be converted to a specific format and representation in order to be used for a specific model.
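The cleaning and conversion steps described above can be sketched in a few lines of Python. The field name, plausibility bounds, and record layout here are illustrative assumptions, not taken from any particular study:

```python
def clean_records(records, field, lo, hi):
    """Keep records whose `field` is numeric (coercing string-coded
    numbers) and falls within a plausible range [lo, hi]; anything
    else is treated as a data-entry error and dropped."""
    cleaned = []
    for record in records:
        value = record.get(field)
        # Coerce string-coded numbers; skip non-numeric strings.
        if isinstance(value, str):
            try:
                value = float(value)
            except ValueError:
                continue
        if not isinstance(value, (int, float)):
            continue  # missing or non-numeric value
        # Drop values outside the plausible range (likely entry errors).
        if lo <= value <= hi:
            cleaned.append({**record, field: float(value)})
    return cleaned

rows = [{"bmi": 22.5}, {"bmi": "31.0"}, {"bmi": None},
        {"bmi": "n/a"}, {"bmi": 400.0}, {"bmi": 27.3}]
print(clean_records(rows, "bmi", 10, 80))
# → [{'bmi': 22.5}, {'bmi': 31.0}, {'bmi': 27.3}]
```

Documenting such thresholds and coercions alongside the model is exactly the kind of reporting the paragraph above calls for, since they silently shape the training data.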
In Search of a Common Deep Learning Stack
Web serving had the LAMP stack, and big data had its SMACK stack. But when it comes to deep learning, the technology gods have yet to give us a standard suite of tools and technologies that is universally accepted. The idea of a common "stack" upon which developers build, and administrators run, applications has become popular in recent years. Faced with a multitude of competing options, developers fear picking the "wrong" tools and technologies and being left on the dark side of a forked project. Administrators, tasked with keeping developers' creations running, are similarly afraid of inheriting a technological albatross that weighs them down.
Reproducible machine learning with PyTorch and Quilt
In this article, we'll train a PyTorch model to perform super-resolution imaging, a technique for gracefully upscaling images by inferring pixel values from a lower-resolution input. Machine learning projects typically begin by acquiring data, cleaning the data, and converting it into model-native formats. Such manual data pipelines are tedious to create and difficult to reproduce over time, across collaborators, and across machines. Moreover, trained models are often stored haphazardly, without version control.
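One lightweight remedy for the "stored haphazardly, without version control" problem, independent of any particular packaging tool, is to record a content hash for every data file and model artifact so collaborators can verify they hold byte-identical inputs. A minimal standard-library sketch, with an illustrative manifest format of my own choosing:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large artifacts don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(paths, manifest="manifest.json"):
    """Record a content hash per artifact; committing this manifest
    lets anyone verify their copy of the data matches exactly."""
    entries = {str(p): sha256_of(p) for p in paths}
    Path(manifest).write_text(json.dumps(entries, indent=2))
    return entries

# Example: hash a small data file created on the fly.
Path("train.csv").write_text("age,bmi\n40,22.5\n")
manifest = write_manifest(["train.csv"])
```

Tools like Quilt build on the same idea, adding storage and retrieval of the versioned artifacts rather than just their fingerprints.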